Skip to content

Add tag-based key discovery support in the aws_kms KeyManager plugin#7006

Open
amartinezfayo wants to merge 2 commits into
spiffe:mainfrom
amartinezfayo:aws-kms-tag-key-discovery
Open

Add tag-based key discovery support in the aws_kms KeyManager plugin#7006
amartinezfayo wants to merge 2 commits into
spiffe:mainfrom
amartinezfayo:aws-kms-tag-key-discovery

Conversation

@amartinezfayo

@amartinezfayo amartinezfayo commented May 30, 2026

Copy link
Copy Markdown
Member

The aws_kms KeyManager plugin currently discovers its keys by listing every KMS alias and calling DescribeKey across the account's keys. This is not the most efficient way to discover SPIRE-managed keys, and it becomes problematic in deployments with a large number of keys in AWS KMS, where it is slow and requires broad kms:ListKeys and kms:DescribeKey permissions.

This PR adds an opt-in tag-based key discovery mode that uses the AWS Resource Groups Tagging API (tag:GetResources) to find only the keys managed by the plugin, identified by SPIRE-specific tags. This mirrors the label-based approach already used by the gcp_kms plugin.

Changes:

  • New enable_tag_based_key_discovery configuration option (boolean, default false).
  • When enabled, keys are discovered via the Resource Groups Tagging API filtered by SPIRE tags (spire-server-td, spire-server-id, spire-active) and looked up by spire-key-id.
  • Keys are stamped with spire-last-update at creation and during migration, and the value is refreshed periodically (every 6h). Stale keys (no refresh for two weeks) are marked inactive and scheduled for deletion, mirroring the alias-based liveness and reclamation contract.
  • Transparent migration: on first startup with the flag enabled, the plugin detects existing alias-managed keys and applies the SPIRE tags to them automatically. No manual steps are required.
  • Aliases are still created and refreshed in tag-based mode so that a downgrade to alias-based discovery remains safe.
  • When tag-based discovery is enabled, a missing tag:GetResources permission now produces an actionable error explaining that the permission is identity-based and cannot be granted through the KMS key policy, instead of a raw AWS error.

Permissions:

  • kms:TagResource is required when tag-based discovery is enabled (or when using key_tags).
  • tag:GetResources is required when tag-based discovery is enabled. It belongs to the Resource Groups Tagging API and must be granted in an identity-based IAM policy.
  • kms:ListKeys is only required for the legacy alias-based discovery path.

Deprecation:
Alias-based key discovery will be deprecated in a future version and removed in a later one. The default remains alias-based for now, and a warning is logged when it is in use.

HA and operational note:
Within a trust domain, enable_tag_based_key_discovery should be configured consistently across all servers. The two discovery modes use independent liveness signals (alias LastUpdatedDate vs the spire-last-update tag), so a partial rollback sustained beyond the two-week window can cause tag-based peers to reclaim a rolled-back server's still-in-use keys. This is documented in the plugin docs.

Documentation:
Updated doc/plugin_server_keymanager_aws_kms.md with the new option, the tag schema, the migration behavior, the permission split, the deprecation note, and the HA configuration-consistency guidance.

Fixes #6269

Signed-off-by: Agustín Martínez Fayó <amartinezfayo@gmail.com>
@amartinezfayo amartinezfayo force-pushed the aws-kms-tag-key-discovery branch from 9efe60e to bc249b3 Compare May 31, 2026 01:32
@amartinezfayo amartinezfayo marked this pull request as ready for review May 31, 2026 02:29
@amartinezfayo amartinezfayo requested a review from evan2645 as a code owner May 31, 2026 02:29
Copilot AI review requested due to automatic review settings May 31, 2026 02:29

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adds an opt-in, tag-based key discovery mode to the aws_kms KeyManager plugin to avoid scanning all KMS keys/aliases in an account, reducing required permissions and improving performance for large KMS inventories. It introduces discovery and reclamation logic based on SPIRE-managed KMS resource tags, with automatic migration for legacy (alias-managed) keys and documentation updates.

Changes:

  • Add enable_tag_based_key_discovery configuration and wire in AWS Resource Groups Tagging API (tag:GetResources) for key discovery.
  • Implement tag-based key liveness (spire-last-update) refresh and stale-key disposal (two-week threshold), with transparent startup migration from alias-managed keys.
  • Update tests, fake clients, and documentation; add resourcegroupstaggingapi AWS SDK dependency.

Reviewed changes

Copilot reviewed 7 out of 8 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
pkg/server/plugin/keymanager/awskms/fetcher.go Adds tag-based key discovery and migration logic using the Resource Groups Tagging API.
pkg/server/plugin/keymanager/awskms/client.go Introduces a tagging API client interface and constructor.
pkg/server/plugin/keymanager/awskms/client_fake.go Extends fakes to support Tagging API and KMS TagResource call recording for tests.
pkg/server/plugin/keymanager/awskms/awskms.go Adds config flag, SPIRE tag schema/constants, tag-based background tasks (keepalive + disposal), and creation-time tag stamping.
pkg/server/plugin/keymanager/awskms/awskms_test.go Adds coverage for tag-based configure/migration, creation-time tag stamping, keepalive tagging, and disposal via tags.
go.mod Bumps aws-sdk-go-v2 and adds resourcegroupstaggingapi dependency.
go.sum Updates dependency checksums accordingly.
doc/plugin_server_keymanager_aws_kms.md Documents tag-based discovery behavior, permissions split, HA considerations, and config examples.

Comment on lines +212 to +215
// Trigger a goroutine to get the details of the key
g.Go(func() error {
entry, err := kf.fetchKeyEntryDetailsFromArn(ctx, keyArn, spireKeyID)
if err != nil {

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

keyArn and spireKeyID are declared inside the loop body, so each goroutine captures its own copy (and the range var is per-iteration on Go 1.22+ anyway). No race here.

Comment on lines +3106 to +3109
// Wait for expected deletions to be processed
for range tt.expectDeleteCount {
_ = waitForSignal(t, deleteSignal)
}

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Range-over-int is valid since Go 1.22 (we are on 1.26 in SPIRE).

Comment on lines +1209 to +1211
// Note: spire-last-update is intentionally omitted here. It is set exclusively
// by keepActiveKeys, which runs on a regular schedule.
func (p *Plugin) buildSPIRETags(serverID, trustDomain string) []types.Tag {

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

Comment on lines +143 to +145
// Note: When enabled, the plugin requires permission to use the
// resourcegroupstaggingapi:GetResources API action.
EnableTagBasedKeyDiscovery bool `hcl:"enable_tag_based_key_discovery" json:"enable_tag_based_key_discovery"`

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

Signed-off-by: Agustín Martínez Fayó <amartinezfayo@gmail.com>
Comment on lines +46 to +47
tagKeyServerTD = "spire-server-td" // Trust domain (no hashing needed - AWS allows dots and long values)
tagKeyServerID = "spire-server-id" // Server identifier

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Double checking my assumptions: I see that AWS KMS has a limit on the tag key to be 128 chars and the value to be 256 chars. This should be ok for both of these since the trust domain should be a maximum of 256 chars and the server ID we require to be a UUIDv4, right?

Comment on lines +323 to +328
case useTagBasedDiscovery && len(newConfig.KeyTags) > 0:
userTags := buildKeyTags(newConfig.KeyTags)
// Build a fresh slice to avoid mutating the spireTags backing array.
p.keyTags = make([]types.Tag, 0, len(spireTags)+len(userTags))
p.keyTags = append(p.keyTags, spireTags...)
p.keyTags = append(p.keyTags, userTags...)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we check that there are no duplicate tags between spireTags and userTags? I don't know if it matters to the API, but the users might want to know.

Tags: []types.Tag{
{
TagKey: aws.String(tagKeyActive),
TagValue: aws.String("false"),

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's possible for the key to be enqueued for deletion and for us to mark the key as inactive, but for the key to not actually be deleted, for example if the key was deleting some other key and then shuts down.

Do we actually need to do the ScheduleKeyDeletion call asynchronously?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

AWS KMS Plugin - Code scan all the KMS keys in the account after every 48 hours

3 participants